 digit classification


A Assumptions and Theoretical Results A.1 Assumptions of risk functions Definition 1

Neural Information Processing Systems

L-Lipschitz continuous gradient, if there exists a constant $L > 0$ such that $\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|$ for all $x, y$. If $f$ is $m$-strongly convex and has an $L$-Lipschitz continuous gradient, then it follows that $m \le L$. Let $\lambda$ be the Lagrange multiplier. Using Jensen's inequality, we have … We next prove the convergence of the algorithm with the proposed weight assignment rule. An edge between two agents means they are neighbors. This models the realistic scenario in which some agents have fewer data samples and may learn more slowly than others.
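The relation between the strong convexity constant m and the Lipschitz constant L stated above can be checked numerically on a simple case. The sketch below is illustrative only and is not taken from the paper: it assumes a hypothetical diagonal quadratic f(x) = 0.5 * (a_1 x_1^2 + a_2 x_2^2), for which m is the smallest coefficient and L is the largest. For any pair of points, strong convexity gives <grad f(x) - grad f(y), x - y> >= m ||x - y||^2, while the Cauchy-Schwarz inequality and the Lipschitz condition bound the same inner product by L ||x - y||^2, which is why m cannot exceed L.

```python
import math
import random

# Hypothetical example (not from the paper): f(x) = 0.5 * sum(a_i * x_i^2).
# For this f, m = min(a_i) and L = max(a_i).
A_DIAG = (1.0, 4.0)
M = min(A_DIAG)
L = max(A_DIAG)


def grad(x):
    """Gradient of the diagonal quadratic: (a_1 * x_1, a_2 * x_2)."""
    return [a * xi for a, xi in zip(A_DIAG, x)]


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(vi * vi for vi in v))


def check_pair(x, y, tol=1e-9):
    """Return True if both inequalities hold for the pair (x, y)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    gdiff = [gx - gy for gx, gy in zip(grad(x), grad(y))]
    inner = sum(g * d for g, d in zip(gdiff, diff))
    # Lipschitz gradient: ||grad f(x) - grad f(y)|| <= L * ||x - y||
    lipschitz_ok = norm(gdiff) <= L * norm(diff) + tol
    # Strong convexity: <grad f(x) - grad f(y), x - y> >= m * ||x - y||^2
    strongly_convex_ok = inner >= M * norm(diff) ** 2 - tol
    return lipschitz_ok and strongly_convex_ok


# Check both inequalities on random pairs of points (fixed seed).
rng = random.Random(0)
pairs = [
    ([rng.uniform(-5, 5) for _ in A_DIAG], [rng.uniform(-5, 5) for _ in A_DIAG])
    for _ in range(1000)
]
all_ok = all(check_pair(x, y) for x, y in pairs)
```

A random check of this kind does not prove the inequalities; it only illustrates them for one assumed function.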


d37eb50d868361ea729bb4147eb3c1d8-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for their valuable comments and appreciation of the ideas and results presented in the paper. We summarize the main questions from the reviewers and address them separately below. To Reviewer #1 Q1: Network connectivity is presumably known ... it seems all the graphs considered are com- … We note that the network connectivity is not assumed to be known. To Reviewer #3 Q1: Scope of the paper/Missing related work. … and "FedNAS" are about … We can add an explanation to clarify the MTL scope of the paper.


Synthetic Simplicity: Unveiling Bias in Medical Data Augmentation

arXiv.org Artificial Intelligence

Synthetic data is becoming increasingly integral in data-scarce fields such as medical imaging, serving as a substitute for real data. However, its inherent statistical characteristics can significantly impact downstream tasks, potentially compromising deployment performance. In this study, we empirically investigate this issue and uncover a critical phenomenon: downstream neural networks often exploit spurious distinctions between real and synthetic data when there is a strong correlation between the data source and the task label. This exploitation manifests as simplicity bias, where models overly rely on superficial features rather than genuine task-related complexities. Through principled experiments, we demonstrate that the source of data (real vs. synthetic) can introduce spurious correlating factors leading to poor performance during deployment when the correlation is absent. We first demonstrate this vulnerability on a digit classification task, where the model spuriously utilizes the source of data instead of the digit to provide an inference. We provide further evidence of this phenomenon in a medical imaging problem related to cardiac view classification in echocardiograms, particularly distinguishing between 2-chamber and 4-chamber views. Given the increasing role of utilizing synthetic datasets, we hope that our experiments serve as effective guidelines for the utilization of synthetic datasets in model training.


A Review of Global Sensitivity Analysis Methods and a comparative case study on Digit Classification

arXiv.org Artificial Intelligence

In the era of deep learning and the rapid advancement of powerful Artificial Intelligence (AI) models, consisting of numerous layers and millions of parameters, the demand for understanding the decision-making process of black-box models is on the rise. Explainable AI is a growing trend that seeks to uncover the inner workings of AI systems through computational analysis, shedding light on the decision-making process. It has been applied across a variety of data types, such as video [1], text [2], AIS [3], causal [4], and genomic data [5], and in applications such as art [6], medicine [7], finance [8], and education [9]. Explainability methods can be broadly divided into model-agnostic (or model-free) and model-specific approaches. Model-agnostic methods can be applied to any trained machine learning model regardless of the learning mechanism and model architecture. Rule-based methods [10] and sensitivity analysis are two common approaches from this category.


Digit Classification with Single-Layer Perceptron

#artificialintelligence

Generally, the first thought that comes to mind when applying supervised learning techniques to images is to use Convolutional Neural Networks (CNNs). Indeed, this type of neural network is the most suitable for this type of task, mainly because it reduces dimensionality. If we imagine a dataset of images that have been flattened (for example, an image that is a 4x4 matrix is converted to a 16-dimensional vector, as shown in Figure 1), the images are data points in an n-dimensional space, where n is the number of pixels in the image. The dimensionality of image data is therefore enormous, which implies an immense number of parameters in the neural network and, in turn, a higher computational cost and execution time. CNNs reduce the dimensionality of the image in each layer of the neural network, which also reduces the number of parameters required in training and optimizes the performance of the model for this type of task.
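The flattening step described above, and its effect on the parameter count of a fully connected layer, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not code from the article: the helper names `flatten` and `dense_param_count` are hypothetical, and the 28x28 input with 10 output classes is an assumed MNIST-like setting.

```python
def flatten(image):
    """Row-major flattening of a 2D list (rows of pixels) into a 1D list."""
    return [pixel for row in image for pixel in row]


def dense_param_count(n_inputs, n_outputs):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_outputs + n_outputs


# A 4x4 "image" whose pixel values are 0..15, as in the 4x4 example above.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
vector = flatten(image)

# Parameter counts for a single layer with 10 outputs (assumed class count).
params_4x4 = dense_param_count(len(vector), 10)
params_28x28 = dense_param_count(28 * 28, 10)  # assumed MNIST-like input size
```

Here `vector` has 16 entries, and the layer on the assumed 28x28 input already needs 7850 parameters, which shows how quickly the count grows with the number of pixels.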


Byzantine Resilient Distributed Multi-Task Learning

arXiv.org Machine Learning

Distributed multi-task learning provides significant advantages in multi-agent networks with heterogeneous data sources where agents aim to learn distinct but correlated models simultaneously. However, distributed algorithms for learning relatedness among tasks are not resilient in the presence of Byzantine agents. In this paper, we present an approach for Byzantine resilient distributed multi-task learning. We propose an efficient online weight assignment rule by measuring the accumulated loss using an agent's data and its neighbors' models. A small accumulated loss indicates a large similarity between the two tasks. In order to ensure the Byzantine resilience of the aggregation at a normal agent, we introduce a step for filtering out larger losses. We analyze the approach for convex models and show that normal agents converge resiliently towards their true targets. Further, an agent's learning performance using the proposed weight assignment rule is guaranteed to be at least as good as in the non-cooperative case as measured by the expected regret. Finally, we demonstrate the approach using three case studies, including regression and classification problems, and show that our method exhibits good empirical performance for non-convex models, such as convolutional neural networks.
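The idea of weighting neighbors by accumulated loss and discarding the largest losses can be illustrated with a small sketch. It is not the paper's rule: the abstract does not give the exact formula, so the code assumes inverse-loss weights and a filter that drops a fixed number of the largest neighbor losses. The function name `assign_weights`, the `num_filtered` parameter, and the `"self"` key are all hypothetical.

```python
def assign_weights(own_loss, neighbor_losses, num_filtered, eps=1e-12):
    """Return normalized aggregation weights for an agent and its kept neighbors.

    own_loss: accumulated loss of the agent's own model on its own data.
    neighbor_losses: dict mapping neighbor id -> accumulated loss of that
        neighbor's model evaluated on the agent's data.
    num_filtered: number of largest neighbor losses to discard (assumed
        filtering step; the paper's exact criterion may differ).
    """
    # Sort neighbors from smallest to largest accumulated loss.
    ranked = sorted(neighbor_losses.items(), key=lambda kv: kv[1])
    # Drop the num_filtered neighbors with the largest losses.
    kept = ranked[: max(len(ranked) - num_filtered, 0)]

    # Assumed rule: a smaller loss means a more similar task, so use 1 / loss.
    inverse = {"self": 1.0 / (own_loss + eps)}
    for neighbor_id, loss in kept:
        inverse[neighbor_id] = 1.0 / (loss + eps)

    # Normalize so the weights sum to one.
    total = sum(inverse.values())
    return {key: value / total for key, value in inverse.items()}


# Toy usage: "byz" reports a model with a very large loss and is filtered out.
weights = assign_weights(
    own_loss=2.0,
    neighbor_losses={"a": 1.0, "b": 4.0, "byz": 100.0},
    num_filtered=1,
)
```

In this toy run the neighbor with the smallest loss receives the largest weight and the filtered neighbor receives none, which matches the qualitative behavior the abstract describes.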


Interpreting and Explaining Deep Neural Networks for Classification of Audio Signals

arXiv.org Artificial Intelligence

Interpretability of deep neural networks is a recently emerging area of machine learning research targeting a better understanding of how models perform feature selection and derive their classification decisions. In this paper, two neural network architectures are trained on spectrogram and raw waveform data for audio classification tasks on a newly created audio dataset, and layer-wise relevance propagation (LRP), a previously proposed interpretability method, is applied to investigate the models' feature selection and decision making. Systematic manipulation of the input data demonstrates that the networks rely heavily on features marked as relevant by LRP. Our results show that by making deep audio classifiers interpretable, one can analyze and compare the properties and strategies of different models beyond classification accuracy, which potentially opens up new ways for model improvements.


Stochastic Deep Learning in Memristive Networks

arXiv.org Machine Learning

Inspired by the computational efficiency of the human brain in processing unstructured data, neural networks have been explored since the 1940s for a wide variety of data analytics applications. The latest generation of deep neural networks (DNNs) has achieved impressive successes rivaling typical human performance, thanks to their ability to capture hidden features from unstructured data using multiple layers of neurons [1]. However, as the number of layers (depth) of the networks increases, DNN training becomes computationally intense and time consuming due to the physically separated execution and memory units in conventional von Neumann machines. This has motivated the exploration of non-von Neumann architectures with closely integrated processing units and local memory elements in dense crossbar arrays with memristive devices [2]. It has recently been proposed that DNNs can be implemented by 2D crossbar arrays of resistive processing units (RPUs) that can store multiple analog states and adjust their conductivity with simple voltage pulses [3]. These RPU devices, when implemented in a crossbar array, can accelerate DNN training if all the weights in the array can be updated in parallel.


Tensor-Variate Restricted Boltzmann Machines

AAAI Conferences

Restricted Boltzmann Machines (RBMs) are an important class of latent variable models for representing vector data. An under-explored area is multimode data, where each data point is a matrix or a tensor. Applying standard RBMs to such data would require vectorizing matrices and tensors, resulting in unnecessarily high dimensionality and, at the same time, destroying the inherent higher-order interaction structures. This paper introduces Tensor-variate Restricted Boltzmann Machines (TvRBMs), which generalize RBMs to capture the multiplicative interaction between data modes and the latent variables. TvRBMs are highly compact in that the number of free parameters grows only linearly with the number of modes. We demonstrate the capacity of TvRBMs on three real-world applications: handwritten digit classification, face recognition and EEG-based alcoholic diagnosis. The learnt features of the model are more discriminative than those of its rivals, resulting in better classification performance.